Diabetes Type II Indian dataset project

Group 1

Members

  • Anna Lifousi (s232979)
  • Jordan Sylvester Fernandes (s222497)
  • Manuel Arcieri (s230158)
  • Quim Bech Vilaseca (s233374)
  • Xavier Viñas Margalef (s233532)

Introduction

  • Diabetes is estimated to affect approximately 530 million adults worldwide, with a global prevalence of 10.5 percent among adults aged 20 to 79 years. 1

  • Type 2 diabetes represents approximately 98 percent of global diabetes diagnoses, although this proportion varies widely among

  • Aim: evaluate the factors associated with the onset of Type 2 diabetes, to support its control and prevention.

Materials and Methods

Data Acquisition and Description

Data Cleaning, Augmentation and Analysis

  • Data Loading: Automatic data loading from server.
  • Data Cleaning: Zero values in several columns treated as missing (NA). For Insulin, a zero was treated as NA only if the patient was diabetic.
  • Data Augmentation: Columns renamed to improve clarity and new categories created for BMI and Age.
  • Data Description & Analysis: Data summarized statistically and visualized graphically. Principal Component Analysis used to explore the correlation between variables.
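
The cleaning and augmentation steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names, the list of columns whose zeros become NA, and the BMI and age bins are all assumptions made for the example.

```python
# Hypothetical column names and bins, chosen for illustration only.
ZERO_AS_NA = ("Glucose", "BloodPressure", "SkinThickness", "BMI")


def clean_record(row):
    """Return a copy of `row` with zeros in ZERO_AS_NA replaced by None (NA)."""
    cleaned = dict(row)
    for col in ZERO_AS_NA:
        if cleaned[col] == 0:
            cleaned[col] = None
    # Insulin: a zero counts as missing only when the patient is diabetic.
    if cleaned["Insulin"] == 0 and cleaned["Outcome"] == 1:
        cleaned["Insulin"] = None
    return cleaned


def bmi_group(bmi):
    """Bin BMI with the WHO cut-offs (assumed; the project's bins may differ)."""
    if bmi is None:
        return None
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def age_group(age):
    """Bin age into three illustrative bands (assumed, not from the project)."""
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"


def augment(row):
    """Add the derived category columns to a cleaned record."""
    return {**row, "BMI_group": bmi_group(row["BMI"]), "Age_group": age_group(row["Age"])}


# Two made-up records, used only to exercise the functions.
records = [
    {"Glucose": 0, "BloodPressure": 70, "SkinThickness": 0,
     "Insulin": 0, "BMI": 31.2, "Age": 52, "Outcome": 1},
    {"Glucose": 95, "BloodPressure": 64, "SkinThickness": 21,
     "Insulin": 0, "BMI": 22.0, "Age": 25, "Outcome": 0},
]
prepared = [augment(clean_record(r)) for r in records]
```

In this sketch the zero Insulin value of the diabetic patient becomes NA, while the same value for the non-diabetic patient is kept.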

Results

Correlation heatmap

  • As expected, BMI and skin thickness are strongly correlated, since skin layer thickness increases with BMI.
  • Weak correlation between insulin and glucose levels, possibly because insulin levels rise when blood glucose rises during the OGTT (oral glucose tolerance test).
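
For reference, the values shown in a correlation heatmap are pairwise correlation coefficients. The sketch below computes Pearson's coefficient for every pair of columns. It assumes that missing values are stored as None and that each pair of columns is compared on the rows where both are present; the project may have handled missing values differently, and the function names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists.

    Pairs in which either value is None (missing) are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values, used only to show the call; not taken from the dataset.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, None, 2.0, 1.0],
}
matrix = correlation_matrix(toy)
```

Each cell of the heatmap then corresponds to one entry `matrix[row][column]`, with values near 1 or -1 indicating a strong linear relationship and values near 0 a weak one.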

Principal component analysis

  • After scaling and centring the data, we performed PCA to examine how the variables correlate when projected onto a two-dimensional space.
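
The scaling, centring and projection step can be sketched with NumPy as below. This is one standard way to compute PCA (singular value decomposition of the standardised data), not necessarily the routine used in the project. The input is a synthetic stand-in with 8 columns, and the sketch assumes rows with missing values have already been removed or imputed.

```python
import numpy as np


def pca(X):
    """PCA of an (n_samples, n_features) array via SVD of the standardised data.

    Returns the scores, the principal axes (one per row) and the
    fraction of total variance explained by each component.
    """
    X = np.asarray(X, dtype=float)
    # Centre each column and scale it to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T
    explained_variance = s ** 2 / (len(Z) - 1)
    ratio = explained_variance / explained_variance.sum()
    return scores, Vt, ratio


# Synthetic stand-in for the cleaned data: 100 samples, 8 variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))

scores, axes, ratio = pca(X_demo)
pc12 = scores[:, :2]  # coordinates in the plane of the first two components
```

Plotting the two columns of `pc12` against each other, coloured by class, gives the usual two-dimensional PCA view.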

  • There is no clear separation between the two classes along the first two principal components.

  • The reason can be traced back to how much variance is explained by each principal component.

  • The first two principal components only account for around 50% of the total variance.

  • To reach at least 90%, we would have to include 6 of the 8 principal components.
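
The number of components needed for a given share of variance follows from the cumulative sum of the per-component explained variance. A minimal sketch is given below; the function name is illustrative, and the ratios in the example are made up to show the call, not the values obtained in this project.

```python
from itertools import accumulate


def components_needed(explained_ratio, threshold=0.90):
    """Smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold`."""
    for k, total in enumerate(accumulate(explained_ratio), start=1):
        if total >= threshold:
            return k
    # Threshold never reached (e.g. rounding): all components are needed.
    return len(explained_ratio)


# Made-up ratios for illustration only.
toy_ratio = [0.5, 0.3, 0.15, 0.05]
n_for_90 = components_needed(toy_ratio, threshold=0.90)
```

With these toy ratios the cumulative shares are 0.5, 0.8 and 0.95, so three components are needed to pass 90%.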

Discussion